DNA barcoding for the assessment of marine and coastal fish diversity from the Coast of Mozambique

The ichthyological provinces of Mozambique are understudied hotspots of global fish diversity. In this study, we applied DNA barcoding to identify the composition of the fish fauna from the coast of Mozambique. A total of 143 species belonging to 104 genera, 59 families, and 30 orders were identified. The K2P distances of the COI sequences within species ranged from 0.00% to 1.51%, while interspecific distances ranged from 3.64% to 24.49%. The study also recorded 15 threatened species according to the IUCN Red List of Threatened Species, with elasmobranchs being the most represented group, and four species not previously recorded in this geographic area: Boleophthalmus dussumieri, Maculabatis gerrardi, Hippocampus kelloggi, and Lethrinus miniatus. This study is the first to use molecular references to explore the fish fauna along the Mozambican coast. Our results indicate that DNA barcoding is a dependable technique for the identification and delineation of fish species in the waters of Mozambique. The DNA barcoding library established in this research will be a valuable asset for advancing the understanding of fish diversity and guiding future conservation initiatives.

Data availability: All relevant data are within the manuscript and its Supporting Information files.


Introduction
The Mozambique Channel comprises an arm of the Indian Ocean located between the Southeast African countries of Madagascar and Mozambique, stretching 1,600 km along the coast [1]. It is recognized as an important diversity hotspot due to a variety of coastal ecosystems that distinguish it from other Western Indian Ocean (WIO) biogeographic provinces [2,3]. These systems provide a range of habitats for both animals and plants, making the region abundant in aquatic species [4,5,6]. The diverse habitats include a large rocky coastline in the north with extensive coral reefs, a wider platform in the central part with river outflows and weaker ocean currents that form sandy banks, estuaries, and deltas, and a southern coast characterized by wide beaches, bays, and seagrass beds with numerous endemic species [7,8,9].
Investigations into fish species diversity on the west coast of Africa remain in their early stages, largely due to a scarcity of both traditional taxonomists and those employing molecular tools for taxonomic inference within various fish groups [10]. Accurate classification and identification of fish species are important not only for taxonomists but also for various types of research, including fisheries, natural resource surveys, forensic studies, the discovery of cryptic species, and the identification of previously unknown species and their conservation status within a specific biogeographic region [11,12,13,14,15]. The lack of knowledge of the ichthyofauna may have even greater repercussions, because research focused on fishing activity can be seriously compromised [16].
Historically, surveys of fish fauna off the coast of Mozambique have primarily relied on traditional methods of identification, which use morphological features and meristic counts to distinguish and define fish species [17,18,19]. However, due to subtle morphological differences, complex evolutionary patterns in certain groups, and the existence of cryptic species, these methodologies may not always overcome the challenges of accurately identifying species, resulting in an underestimation of the true diversity of ichthyofauna [20,21,22].
These concerns are particularly significant for international organizations focused on biodiversity conservation, which have traditionally relied on the rigid concept of species. This concept has recently been widely challenged by the more comprehensive and integrated approaches adopted by modern taxonomy for classifying living organisms [23,24,25]. Although there is consensus that species represent the smallest independent evolutionary unit, their recognition remains a subject of debate [26,27], potentially impacting conservation efforts. Furthermore, morphological identification of fish can prove a challenging task if not conducted by specialists, leading to inaccurate identifications [28,29]. Additionally, previous genetic studies have shown that molecular identification may not align with identifications based on morphological characteristics in some instances [30,31]. This highlights the importance of carrying out a systematic and comprehensive inventory of ichthyofauna using standardized DNA-based methods.
In response to the limitations of classical taxonomy, DNA barcoding has emerged as a widely adopted research method for accurately identifying species [32,33,34]. This technique, which relies on the diversity of the cytochrome c oxidase subunit I (COI) gene region within mitochondrial DNA, serves as an alternative to traditional taxonomic approaches.
DNA barcoding has proven crucial in enhancing the accuracy of species identification and is especially valuable for differentiating between diverse and poorly understood flora and fauna that necessitate species delimitation [32,33,34,35,36,37,38].
Large-scale investigations of ichthyofaunas across various ecosystems have demonstrated that the cytochrome oxidase I gene captures a significant proportion of known diversity [40,41,42]. It has also helped to reveal previously unknown diversity [43,44,45,46]. This method is successful due to the availability of reliable DNA barcode reference libraries in the BOLD Systems (Barcode of Life Data Systems) and GenBank [39]. New DNA barcodes can be analysed together with available data to check for potential taxonomic conflicts and improve taxonomic resolution [47].
Considering the remarkable potential of DNA barcoding to identify fish species and the still poorly known fish assemblage off the coast of Mozambique, the aim of this study is to employ this approach to establish a DNA barcoding reference database for the fish fauna off the coast of Mozambique. This will have significant implications for our understanding of diversity and could contribute to future conservation efforts for marine species in Mozambique.

Sampling area and specimen collection
The Mozambican coast is in Eastern Africa (Fig 1), spanning over 2,700 km of coastline along the Indian Ocean, the second largest African coast [48,49]. The coastal region showcases a range of diverse landscapes, including sandy beaches, dunes, forests, mangrove swamps, seagrass, and coral reefs. Three distinct ecological zones can be found along the coast: the Delagoa Basin in the south, the highly productive and rich Sofala Bank in the central region, and the São Lazaro Bank further north [49,50].
Sampling was conducted randomly across all coastal provinces of the country from December 2019 to April 2022, considering both dry and rainy seasons (Fig 1).
Different fishing gear (trawls, surface, and bottom gillnets) commonly used in the country's artisanal fisheries were used for specimen collection. The collected samples were stored in ice boxes and transported to the sorting site, where a substantial portion of the specimens was photographed, and tissues were removed. The specimens were identified to the lowest taxonomic level based on the nomenclature used by the vendors and using species identification keys specific to South-eastern Africa [51,52].
Approximately 50 mg of muscle tissue was collected from each specimen and preserved in 1.5 ml Eppendorf-type microtubes filled with 96% ethanol. The samples were stored in a freezer set at -20 °C. After the initial collection of tissues, the specimens were transported to the Evolution Laboratory at the Institute of Coastal Studies of the Federal University of Pará, located in Bragança, Pará, Brazil, for the purpose of generating genetic data.
The PCR cycling conditions comprised an initial denaturation step at 94°C for 3 minutes, followed by 35 cycles of denaturation at 95°C for 1 minute, annealing at 50°C-58°C for 45 seconds, and extension at 72°C for 45 seconds. The process was concluded with a final extension at 72°C for 5 minutes. The positive reactions were then sequenced using an ABI 3500 automatic sequencer (Applied Biosystems).
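As a quick check on the cycling conditions reported above, the following minimal sketch adds up the programmed hold times. The constant and function names are illustrative assumptions, and ramping time between temperatures, which depends on the thermocycler, is ignored.

```python
# Thermal profile as reported in the text; names are illustrative labels only.
INITIAL_DENATURATION_S = 3 * 60   # 94 °C for 3 minutes
CYCLE_STEPS_S = {
    "denaturation": 60,           # 95 °C for 1 minute
    "annealing": 45,              # 50-58 °C for 45 seconds
    "extension": 45,              # 72 °C for 45 seconds
}
N_CYCLES = 35
FINAL_EXTENSION_S = 5 * 60        # 72 °C for 5 minutes


def total_hold_time_s() -> int:
    """Sum of programmed hold times in seconds (ramping is not included)."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + N_CYCLES * per_cycle + FINAL_EXTENSION_S


if __name__ == "__main__":
    seconds = total_hold_time_s()
    print(f"{seconds} s = {seconds / 60:.1f} min")
```

Under these assumptions the programmed holds amount to 5730 s, or about 95.5 minutes per run.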

Molecular data analysis.
The final sequences were aligned using ClustalW [53], implemented in GENEIOUS 9.0.5 (https://www.geneious.com), and then translated into amino acids to check for potential stop codons using the MEGA 11 software. All the generated sequences were submitted to the BOLD database. The high-quality sequences were compared using the National Center for Biotechnology Information (NCBI) BLAST search engine and the Barcode of Life Data Systems (BOLD) database. The GBIF (Global Biodiversity Information Facility) repository was used to compile the taxonomic information of the species. After analysing the sequences, parameters such as the presence of a barcode gap, sequence length, GC content, genetic distances within and between families, genera, and species, and minimum distances to the nearest neighbour were calculated with the tools of the BOLD Systems platform, using Kimura's two-parameter model and 1000 bootstrap replicates [54]. Phylogenetic trees were constructed in MEGA 11 using the neighbour-joining (NJ) method based on the Kimura 2-parameter model with 1000 bootstrap pseudoreplicates [54,55], and the online tool Interactive Tree of Life (iTOL) [56] was used to visualize and edit the tree.
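To make the distance measure concrete, the sketch below computes the Kimura 2-parameter distance between two aligned sequences from the proportions of transitions (P) and transversions (Q), using d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)]. It is an illustration only, not the BOLD or MEGA implementation; the function name and the choice to skip gaps and ambiguity codes are assumptions.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes at this site
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; all other changes are transversions.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # math.log raises ValueError for saturated pairs where the argument is <= 0.
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
```

For example, two 10-bp sequences that differ by a single transition give P = 0.1 and Q = 0, so the distance is -0.5 ln(0.8), approximately 0.1116.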
Analysis of Barcode Index Numbers (BIN) [57] was conducted for species delimitation, based on uncorrected p-distances, which provide a single BIN for each Operational Taxonomic Unit (OTU) obtained from the COI sequences. The analysis of BINs reports the maximum intraspecific distance and the minimum interspecific distance, whose comparison allows species to be identified. The aligned sequences, primer pairs, trace files, taxonomic information, and collection data were deposited in the Barcode of Life Data Systems under the project MOZFH [57].

DNA barcoding results.
The species composition was represented by Carangidae (16 species), Sparidae (10 species), Serranidae (9 species), Haemulidae and Lutjanidae (7 species each), and Scombridae and Mugilidae (6 species each). The remaining families were composed of fewer than 5 species (Table 1). The sequences generated in this study were all submitted to the BOLD Systems, and the details of the species list are provided in S1 Table.
The final analysed database had no insertions, deletions, or stop codons, indicating that it accurately represents the mitochondrial COI fragment. Low-quality sequences were excluded from the analysis. The average nucleotide frequencies were G = 17.77%, C = 28.74%, A = 23.88%, and T = 29.61%, with an average GC content of 46.51%.
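The nucleotide composition above can be reproduced for any sequence with a few lines of code. The sketch below is illustrative only (the function names are assumptions, and non-ACGT characters are ignored); it also stores the reported averages so the stated GC content can be checked as G + C.

```python
from collections import Counter


def base_frequencies(seq: str) -> dict:
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq: str) -> float:
    """GC content as a percentage of unambiguous bases."""
    freqs = base_frequencies(seq)
    return freqs["G"] + freqs["C"]


# Average frequencies reported in the text (percent).
REPORTED = {"G": 17.77, "C": 28.74, "A": 23.88, "T": 29.61}
REPORTED_GC = round(REPORTED["G"] + REPORTED["C"], 2)
```

The reported G and C frequencies sum to 46.51%, matching the stated GC content, and the four frequencies sum to 100%.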
The NJ tree included 143 distinct OTUs with high bootstrap support (70/99) (Fig 2). Specimens belonging to the same species clustered together, showing a coherent relationship between morphological identification and genetic similarity. No evidence of taxonomic deviation was found in any species group, indicating that the DNA barcoding approach accurately identified the analysed species.

Genetic distances and Barcoding gap.
The Kimura 2-parameter (K2P) model was used to calculate the genetic distance within and between species and to test for the presence of a barcode gap. The overall K2P distances of the COI sequences are shown in Table 2. Intraspecific distances ranged from 0.00% to 1.51%, while interspecific distances within genera ranged from 3.64% to 23.43%; the intraspecific values fall below, and the congeneric values above, the 2% threshold commonly used for the identification of fish species by DNA barcoding [58,59] (Table 2 and Fig 3A and 3B). Distances increased with taxonomic level, with distances within families ranging from 8.27% to 25.52%. No species exceeded the 2% threshold; the highest intraspecific distance, 1.51%, was found in Acanthurus mata (OTU-4), which nevertheless showed a barcode gap. The remaining species had distances of less than 1%. The minimum nearest-neighbour distance was 3.64%, between Epinephelus coioides (OTU-42) and Epinephelus malabaricus (OTU-41), and the maximum was 22.08%, between Hippocampus kelloggi (OTU-52) and Carcharhinus leucas (OTU-19). Details of the species distances are summarized in S3 Table.
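The barcode gap criterion used above can be stated compactly: a species shows a gap when its nearest-neighbour distance exceeds its maximum intraspecific distance. The sketch below illustrates that comparison; the record type, function names, and the two example records are hypothetical and are not taken from Table 2 or S3 Table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeciesDistances:
    name: str
    max_intraspecific: float   # maximum within-species K2P distance, percent
    nearest_neighbour: float   # minimum distance to another species, percent


def has_barcode_gap(record: SpeciesDistances) -> bool:
    """A gap exists when the nearest neighbour is farther than any conspecific."""
    return record.nearest_neighbour > record.max_intraspecific


def species_without_gap(records) -> list:
    """Names of species whose intra- and interspecific distances overlap."""
    return [r.name for r in records if not has_barcode_gap(r)]


# Hypothetical records for illustration only.
EXAMPLE_RECORDS = [
    SpeciesDistances("hypothetical_species_A", 1.51, 3.64),
    SpeciesDistances("hypothetical_species_B", 2.50, 2.00),
]
```

With these made-up values, only the second record lacks a gap, because its nearest neighbour (2.00%) is closer than its most divergent conspecific (2.50%).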

Revealing hidden fish diversity.
Four species were detected for the first time in this geographical area, where their presence was not expected: Boleophthalmus dussumieri, Maculabatis gerrardi, Hippocampus kelloggi, and Lethrinus miniatus. The species Boleophthalmus dussumieri is known to be endemic from the Persian Gulf region down to the Indian region [3], and Lethrinus miniatus is known from the Western Pacific [60]. For B. dussumieri, our molecular analysis has shown that there is more than one lineage, which has not been discussed previously. The data also showed the presence of the Great Seahorse (Hippocampus kelloggi) in the Central Province of Mozambique. This species is listed as Vulnerable in the IUCN Red List, and its occurrence is known only from the Indo-Pacific region of Southeast Asia.
The other significant finding from our study was the presence of one of the most invasive fish species, the silver carp (Hypophthalmichthys molitrix), which is native to East Asia. This species is classified as "Near Threatened" and has spread globally, including to the Limpopo River in southern Mozambique. Rhinoptera sp. was also recorded in this study; however, the occurrence of this species in Mozambique is not officially listed in the global fisheries database. Our molecular data also allowed us to detect, for the first time, the presence of Pangasius djambal [61], which is known from Southeast Asia, Dentex macrophthalmus, which is distributed from the Mediterranean Sea to the eastern Atlantic [62], and Fistularia petimba, which is found in the western Atlantic [63].

Keystone species and Conservation status.
Sharks, stingrays, and seahorses are among the most threatened species in the world due to their importance in commercial trade and use for various purposes, and therefore the conservation and management of these species must be carefully ensured. In our study, we recorded fourteen endangered species (

Discussion
This study has demonstrated the utility of DNA barcoding to supplement morphological identification. These results are particularly important because, while the DNA barcoding method has been applied globally to identify fish species, this study provides, for the first time, an overview of the marine fish diversity in the coastal region of Mozambique. This will make a significant contribution to our understanding of the understudied diversity in the Indo-West Pacific biogeographic region.
This study substantially improves our comprehension of fish species and families inhabiting the Mozambique Coast. The prominent occurrence of species within particular families is attributable to both the availability of specimens in the examined areas and the presence of economically important fish species. For instance, the Labridae family is especially abundant in southern Mozambique [64,65]. In terms of family representation, this research ranks fourth, identifying 60 distinct families. This number is lower than those reported in previous studies, which recorded 90 and 94 families respectively, all using morphological identification methods [65]. Since morphological identification is prone to errors that can lead to misclassification of species, this study presents the most accurate and reliable assessment of fish diversity along Mozambique's coastline to date.
Accurate species identification by DNA barcoding relies on intra- and interspecific genetic distances. Consequently, closely related species may be readily detected in nearby geographic regions of occurrence. Divergence is considered to indicate separate species when genetic distance values exceed 2%, the established threshold for species delimitation [42]. However, no specific threshold has been defined for interspecific or interfamily genetic distances. In the current study, the mean K2P genetic distances for the barcode region of the COI gene at the species, genus, and family levels are 0.21±0.00%, 12.57±0.01%, and 17.29±0.00%, respectively (Table 2). The average K2P genetic distances among species, genera, and families in this study were similar to findings reported in other studies [66,32,67]. However, the minimum interspecific distance observed was substantially greater than that reported for fish in the China Sea [68,69,70], which can be ascribed to the increased taxonomic diversity of fish species in the present study.
In the present study, six species had low genetic distance from each other and clustered closely, and two of them were not identified at species level. The nearest neighbour of Rhynchobatus australiae is R. djiddensis (3.72%), and Oreochromis mossambicus and O. niloticus were separated by 3.68%; the other close pairs were Epinephelus malabaricus and E. coioides, Selar crumenophthalmus and Selar sp., and Caesio caerulaurea and Caesio sp. (Fig 2). The two taxa that we could not identify at species level clustered closely with a named congener, so it is very likely that the sequences of these two morphospecies belong to the same genus as their nearest neighbour [71].
The barcode gap method was highly effective in distinguishing and delimiting the species analysed. Although 143 species were identified, each represented by its own operational taxonomic unit (OTU), some pairs of OTUs were separated by remarkably low distances, comparable to those within a single species, yet were kept as separate OTUs, such as Molgarda sp1 (OTU-72) and Molgarda sp2 (OTU-73), and Selar sp. (OTU-138) and Selar crumenophthalmus (OTU-139) (S2 Table). This scenario arises when there is insufficient data in public databases to match the analysed specimens.

Revealing hidden fish diversity.
The Indo-West Pacific region is renowned for its high diversity of fish species; however, there have been limited or inadequate studies conducted on the composition of fish fauna in the western Indian Ocean. This lack of information can result in an overestimation or underrepresentation of the biodiversity in this region. As an example, this study provides valuable data on the enigmatic presence of the amphibious fish species B. dussumieri, whose distribution was previously known to be restricted to the coast of Iran, across the Arabian Sea, and into the Indian region [72,73].
This study also reports the presence of the stingray species Maculabatis gerrardi for the first time. This species is recognized as a complex of species within the Indo-Pacific region, with three distinct lineages distributed along its geographic range [74].
Through the analysis of our COI sequences together with the COI sequences available in GenBank and the BOLD System, our samples were found to cluster with M. gerrardi samples from KwaZulu-Natal (JF493650.1, JF493649.1) and Durban (JF493648.1) in South Africa, and the detection of an additional possible lineage in southern Africa (Muhala et al., in preparation) suggests the need for further taxonomic examination of this group of stingrays.
Our data also demonstrate the occurrence of Hippocampus kelloggi, which significantly expands the known distribution range of the species, previously reported from China, India, Indonesia, Japan, Malaysia, the Philippines, Thailand, Vietnam, and Tanzania [75]. The study confirms the presence of silver carp (Hypophthalmichthys molitrix) in the Limpopo estuary of southern Mozambique; this species is widely regarded as one of the most introduced fish species worldwide [76]. The circumstances surrounding its introduction to Mozambique remain unclear, because it does not appear in the list of species produced in aquaculture. It is possible that the presence of this species in the region may be attributed to flood runoff from South Africa [77].
The presence of Lethrinus miniatus in the western Indian Ocean is surprising, as it is generally found in the Western Pacific, the Ryukyu Islands, eastern Philippines, northern Australia, and New Caledonia [78]. Previous records of its occurrence outside of this range are regarded as misidentifications of another species of the same genus, Lethrinus olivaceus. However, our molecular data, when compared with previous studies [79,80,81], support the presence of Lethrinus miniatus in Mozambican waters.
Our study also documented the presence of Fistularia petimba. This species has been reported in several studies as spreading into new areas, including the archipelago of the Azores, Portugal [82], the Aegean Sea [83], the southern Iberian Peninsula [84], the Mediterranean Sea [85,86], and the Syrian coast [87]. Notably, Fistularia petimba has also been recorded in the Gulf of Carpentaria in Australia [88]. Until now, there have been no confirmed reports of its presence in the waters of Mozambique.
In summary, we provide strong evidence of the effectiveness of using DNA barcoding to accurately discriminate and identify many marine and coastal fish species examined thus far. The COI divergence patterns corresponded with morphologically recognized species; however, in some cases, the molecular data revealed previously undetected genetic divergence within a group and instances of low interspecific variation.
The presence of cryptic taxa is relatively common among marine animals, emphasizing the need to consider the possibility of neglected diversity and the occurrence of species complexes. The more comprehensive the barcode library, the more useful it will be for the Barcode of Life Initiative [89]. This study represents a significant contribution towards consolidating DNA barcoding as a global system for identifying life forms and enhancing our understanding of the genetic diversity of Mozambican marine fish.
However, even among experienced taxonomists, consistent application of species names remains a challenge, especially when cryptic diversity is present. This is reflected in the conflicting names applied to specimens within the same BIN by different research groups. As the collection of global data increases, barcodes and BINs will play a vital role in integrating taxonomic feedback and significantly contributing to the standardization of names at the international level. This standardization is crucial for the sustainable management of the world's fisheries.

Fig 1. Map of sample origin areas. All the yellow dots correspond to the total areas collected along the coast. The points marked A to E are specific illustrations of the different sampling areas.

Fig 3. Barcoding gap: Maximum intraspecific Kimura 2-parameter (K2P) distances (A) and mean interspecific K2P distances (B) recorded in fish species from the coast of Mozambique. The graphs show the overlap of the maximum and mean intraspecific distances with the interspecific (NN = nearest neighbour) distances.

Table 1. List of the 143 fish species from Mozambican coastal waters that were DNA barcoded.
In this study, we generated a final alignment of 419 sequences of the barcode region of the COI gene, with a length of 622 bp. Of these, only sequences with a length of 400 to 622 bp were used for genetic analysis. The data correspond to 143 species, 104 genera, 59 families, 15 orders, and 4 unidentified taxa (Table 1 and S1 Table). All samples were assessed morphologically and then subjected to DNA barcoding evaluation to confirm their identification. The samples comprised two groups, teleosts (392) and elasmobranchs (27), which were acquired from local artisanal fishermen at landing sites and fish markets. For more details of the NJ tree clustering and composition, see S1 Fig.
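The length criterion described above (sequences of 400 to 622 bp retained for analysis) can be sketched as a simple filter. This is an assumption-laden illustration: the function names are hypothetical, sequences are taken to be stored as a mapping from an identifier to an aligned string, and alignment gaps ("-") are not counted towards length.

```python
MIN_LENGTH_BP = 400   # lower bound stated in the text
MAX_LENGTH_BP = 622   # alignment length stated in the text


def ungapped_length(seq: str) -> int:
    """Number of characters in a sequence, excluding alignment gaps."""
    return sum(1 for base in seq if base != "-")


def filter_by_length(sequences: dict) -> dict:
    """Keep only records whose ungapped length lies within the stated bounds."""
    return {
        seq_id: seq
        for seq_id, seq in sequences.items()
        if MIN_LENGTH_BP <= ungapped_length(seq) <= MAX_LENGTH_BP
    }
```

Counting ungapped bases rather than raw string length is a design choice made here so that padded alignment columns do not let short reads pass the filter.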